Abstract. Microarrays are research tools used in gene discovery as well
as disease and cancer diagnostics. Two prominent but challenging problems 
related to microarrays are the Border Minimization Problem (BMP) and the 
Border Minimization Problem with given placement (P-BMP). 

In this paper we investigate the parameterized complexity of natural variants
of BMP and P-BMP, termed BMP® and P-BMP® respectively, under
several natural parameters. We show that BMP® and P-BMP® are in FPT
under the following two combinations of parameters: 1) the size of the alphabet (c),
the maximum length of a sequence (string) in the input (ℓ) and
the number of rows of the microarray (r); and 2) the size of the alphabet
and the size of the border length (o). Furthermore, P-BMP® is in FPT
when parameterized by c and ℓ. We complement our tractability results with
corresponding hardness results. 


1 Introduction 

DNA and peptide microarrays are important research tools used in gene discovery,
multi-virus discovery as well as disease and cancer diagnosis. Apart from measuring the
amount of gene expression [18], microarrays are an efficient tool for making a qualitative
statement about the presence or absence of biological target sequences in a sample. For
example, peptide microarrays are used for detecting tumor biomarkers.

Figure 1: Asynchronous synthesis of a 2 x 2 microarray. The deposition sequence D = CTAC
corresponds to four masks M_1, M_2, M_3, and M_4. The masked regions are shaded
and the border between the masked and unmasked regions is represented by bold lines.

A microarray is a plastic or glass slide consisting of thousands of sequences of
nucleotides called probes, each of which is assigned to one cell in the array. The synthesis
process consists of two components: probe placement and probe embedding. In the probe
placement, the goal is to determine an assignment of each probe to a unique cell of the
array. If the placement is given, one has to create the sequences at their respective cells
(probe embedding). This can be achieved with the help of the following two operations: one
can mask a certain set of cells, and one can append a certain nucleotide
to the probes in all those cells which are currently unmasked. Essentially, the nucleotides
are represented as characters and the probes as strings. In probe embedding we want
to find a common supersequence of all probes, called the deposition sequence, and a
sequence of 2D arrays describing the masks. The cells of a mask can be either masked
(opaque) or unmasked (transparent), the latter allowing the deposition of the nucleotide
associated with the mask. For any cell, the concatenation of the nucleotides for which the cell is
transparent has to match the probe in that cell of the microarray. See Figure 1 for an
example.

Due to diffraction, the cells on the border between the masked and the unmasked
regions are often subject to unintended illumination, which can compromise experimental
results. Therefore, unintended illumination should be minimized. The magnitude
of unintended illumination can be measured by the border length of the masks used,
which is the number of borders shared between masked and unmasked regions. For example, in
Figure 1 the border length of each of M_1, M_3, M_4 is 2 and that of M_2 is 4, which yields a
total border length of 10.

The problem of finding both the placement and the embedding is termed the Border 
Minimization Problem (BMP). If the placement is given and the task is to find only the 
embedding, we speak of P-BMP. We refer the reader to Section 2 for formal definitions
of BMP and P-BMP. 

Variants of border minimization. In this paper we consider the exhaustive variants
of BMP and P-BMP, termed BMP® and P-BMP® respectively. The difference is
that in P-BMP® (and, consequently, in BMP®) we assume that a mask is always applied
exhaustively (we call this the exhaustive rule). More precisely, when a mask that synthesizes
a character c is applied, the mask has a transparent cell wherever the remaining (not yet
synthesized) part of the corresponding sequence begins with the character c.

Table 1: Overview of results.

Without this assumption it is possible to artificially increase the length of the deposition
sequence which, as a consequence, also increases the length of the sequence of
masks. In most application scenarios this is undesirable, since applying a mask requires
an additional cycle of work that causes a waste of material and can also introduce new
errors. A second advantage of these exhaustive variants is that they allow the concise
description of solutions: a solution to P-BMP® is fully characterized by the deposition
sequence, while for P-BMP it is also necessary to explicitly describe each mask in the
sequence. To clarify, we remark that an optimal exhaustive solution need not always be
an optimal solution for P-BMP (or BMP): there are cases where the border length can
increase.

We illustrate the usefulness of the assumption by a simple example. In the P-BMP®
instance a|b|a, this assumption indeed helps to reduce the number of masks without
increasing the border length. A non-exhaustive optimal solution might work on the left
a first, while an exhaustive optimal solution works on both a's concurrently. Even though
the border length is 4 in both cases, the non-exhaustive case could require an additional
mask.
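The a|b|a example can be checked mechanically. The following is a minimal sketch, assuming a 1 x 3 array and representing each mask as a list of booleans (True = transparent); the helper names `mask_border`, `total_border` and `synthesized` are illustrative and not taken from the paper.

```python
def mask_border(mask):
    """Border length of one mask on a 1 x m array.

    `mask` is a list of booleans (True = transparent). A border is a pair of
    neighbouring cells of which exactly one is transparent.
    """
    return sum(1 for left, right in zip(mask, mask[1:]) if left != right)


def total_border(schedule):
    """Sum of the border lengths of all masks in a schedule."""
    return sum(mask_border(mask) for _, mask in schedule)


def synthesized(schedule, width):
    """Probes produced in each cell after applying every (character, mask) pair."""
    cells = [""] * width
    for character, mask in schedule:
        for index, transparent in enumerate(mask):
            if transparent:
                cells[index] += character
    return cells


# Exhaustive schedule: both cells holding 'a' are processed by the same mask.
exhaustive = [("a", [True, False, True]), ("b", [False, True, False])]
# Non-exhaustive schedule: the left 'a' is deposited before the right one.
non_exhaustive = [
    ("a", [True, False, False]),
    ("b", [False, True, False]),
    ("a", [False, False, True]),
]

for name, schedule in (("exhaustive", exhaustive), ("non-exhaustive", non_exhaustive)):
    print(name, synthesized(schedule, 3), len(schedule), total_border(schedule))
```

Both schedules synthesize the probes a, b, a with total border length 4, but the non-exhaustive one needs three masks instead of two.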

Our results. Our results are summarized in Table 1. In this paper we investigate
the parameterized complexity of the BMP® and P-BMP® problems under several natural
parameters. First of all, throughout this work we consider the number of available
nucleotides c (i.e., the alphabet size) as a parameter. Notice that this assumption does not
impose a serious restriction, since in practice the number of available nucleotides is very
limited (or even constant). Orthogonal to this assumption we explore the parameterized
complexity of the BMP® and P-BMP® problems with respect to three natural parameters,
i.e., the maximum length of a sequence in the array (ℓ), the maximum border length
cost (o), and the maximum number of rows in the array (r). Since errors become more
likely as the length of the sequence grows, the length of the constructed probes will be
rather limited. Notice that the parameter o models the cost of a solution and hence is
also a natural parameter. Finally, with the maximum number of rows r the shape of the
array is restricted in the sense that one dimension does not grow arbitrarily. This is,
in particular, interesting because it allows us to generalize from the one-dimensional case
studied in [El- 

More precisely, we show fpt-algorithms for BMP® and P-BMP® if we are given either
c, ℓ, r or c, o as parameters. We complement these results with parameterized intractability
results, i.e., by showing paraNP-hardness. We use a polynomial-time reduction from
P-BMP® to BMP® to build upon the result that P-BMP® parameterized by c and r is
paraNP-hard¹ and obtain hereby paraNP-hardness for BMP® parameterized by c and r.
Notice that with the exception of BMP® parameterized by c and ℓ, we obtain a full
parameterized complexity map of the two considered problems with respect to all additional
parameters considered in this paper. We furthermore provide a reduction relating
the complexity of BMP® parameterized by c and ℓ to k-Balanced Partition on grids,
a well-studied problem whose parameterized complexity on grids is open (Proposition 3).

The rest of the paper is organized as follows. In Section 2 we introduce the problems
formally and give preliminaries. Then, in Section 3 we show the reduction from P-BMP®
to BMP®. Section 4 introduces the fpt-algorithms and, finally, in Section 5 we present
conclusions and open problems.

2 Preliminaries 

For $n \in \mathbb{N}$, we use $[n]$ to denote the set $\{1, \ldots, n\}$. For two sequences $s_1, s_2$, we use
$s_1 \cdot s_2$ to denote their concatenation.

The microarray has size $r \times m$, where $r$ is the number of rows and $m$ is the number
of columns. The multiset of input sequences (also called probes) is denoted by
$S = \{s_1, s_2, \ldots, s_{r \cdot m}\}$ and the input alphabet by $\Sigma$. Moreover, let $c = |\Sigma|$. For any sequence
$s_i$, we denote the length of the sequence by $\ell_i$ and the $t$-th character of $s_i$ by
$s_i[t]$. We use $\ell$ for the maximum length of the probes, i.e., $\ell = \max_{i \in [r \cdot m]} \ell_i$. Two cells of
the array $v_1 = (x_1, y_1)$ and $v_2 = (x_2, y_2)$ are said to be neighbors if $|x_1 - x_2| + |y_1 - y_2| = 1$.
For each cell $v$, we denote the set of neighbors of $v$ by $\mathcal{N}(v)$.

In order to give the formal definition of BMP, we introduce several notions related to 
the synthesis process. 

Definition 1. A placement of the probe sequences is a bijective function $\varphi$ that maps
each probe sequence to a unique cell in the array.

Definition 2. A deposition sequence $D$ for a set of sequences $S$ is a sequence of characters
which is a common supersequence of all sequences in $S$.

Definition 3. An embedding of a sequence $s_i$ into a deposition sequence $D$ is a length-$|D|$
sequence $\varepsilon_i$ over alphabet $\Sigma \cup \{-\}$ such that:

1. $\varepsilon_i$ contains precisely $|s_i|$ characters other than "-", occurring at positions
   $\varepsilon_i[u_1], \varepsilon_i[u_2], \ldots, \varepsilon_i[u_{|s_i|}]$,

2. $u_1$ is the minimum position such that $\varepsilon_i[u_1] = s_i[1]$,

3. for $2 \le j \le |s_i|$, $u_j$ is the minimum position such that $\varepsilon_i[u_j] = s_i[j]$ and $u_{j-1} < u_j$.
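Definition 3 amounts to a greedy left-to-right scan of the deposition sequence. The sketch below is one possible reading of it, assuming probes and deposition sequences are Python strings and using "-" as the blank symbol; the function name `embed` is illustrative.

```python
def embed(probe, deposition):
    """Exhaustive embedding of `probe` into `deposition`.

    Scans the deposition sequence once and consumes the next character of the
    probe at the earliest position where it matches; all other positions are
    filled with the blank symbol '-'.
    """
    embedding = []
    consumed = 0  # number of probe characters already synthesized
    for character in deposition:
        if consumed < len(probe) and probe[consumed] == character:
            embedding.append(character)
            consumed += 1
        else:
            embedding.append("-")
    if consumed != len(probe):
        raise ValueError("deposition sequence is not a supersequence of the probe")
    return "".join(embedding)


print(embed("AC", "CTAC"))  # --AC
print(embed("CC", "CTAC"))  # C--C
```

Because each probe character is taken at the first possible position, the embedding is determined by the deposition sequence alone, which is the property the exhaustive rule relies on.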


¹Although in [17] only NP-hardness is proven for P-BMP, the reduction can also be used to show
paraNP-hardness for P-BMP® when parameterized by c and r.

Informally, $\varepsilon_i$ captures how a sequence is built (or, equivalently, deleted) by the
deposition sequence; notice that due to the exhaustive rule, the embedding is uniquely
determined by the deposition sequence. An embedding of a set of probes $S$ into a
deposition sequence $D$ is then denoted by $\varepsilon_D = \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{|S|}\}$. Note that we will drop the
subscript when the associated deposition sequence is clear from the context. The final
key notion we need is that of masks.

Definition 4. A mask $\mathcal{M}_i$ (for some character $c$) is a 2D array such that $\mathcal{M}_i(a, b)$ is
either $c$ or a space "-" (here the space means that the character is not deposited into
this cell).

The sequence of masks associated with a deposition sequence $D$ and a placement $\varphi$
is $\mathcal{M}_1, \ldots, \mathcal{M}_{|D|}$, where $\mathcal{M}_i(a, b) = \varepsilon_{\varphi^{-1}(a,b)}[i]$ for $i \in [|D|]$. Notice that due to the
exhaustive rule, a mask for character $c$ is always maximal with respect to $c$, i.e., there
is no "-" in the mask that could be replaced by $c$. We now introduce the border length
of a given placement of the probes in the array, which is the value we aim to optimize.

Definition 5. Let $\mathrm{border}_D(s_i, s_j)$ be the Hamming distance between $\varepsilon_i$ and $\varepsilon_j$ (with
respect to deposition sequence $D$). The border length of a placement $\varphi$ and a deposition
sequence $D$ is then defined as the sum of borders over all pairs of neighboring probe
sequences:

\[
BL(\varphi, D) = \sum_{\substack{i, j \in \mathbb{N}:\; i < j \le |S| \\ \varphi(s_j) \in \mathcal{N}(\varphi(s_i))}} \mathrm{border}_D(s_i, s_j). \tag{1}
\]

We can also equivalently define border length in terms of the border length of all the 
masks. 


Definition 6. For any mask $\mathcal{M}_i$ of deposition character $x$, the border length of $\mathcal{M}_i$,
denoted by $BL(\mathcal{M}_i)$, is defined as the number of pairs of neighboring cells $(i_1, j_1)$ and
$(i_2, j_2)$ such that $\mathcal{M}_i(i_1, j_1) = x$ and $\mathcal{M}_i(i_1, j_1) \ne \mathcal{M}_i(i_2, j_2)$. For a placement and
deposition sequence that corresponds to a sequence of masks $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_{|D|}$ we
let

\[
BL(\varphi, D) = \sum_{h=1}^{|D|} BL(\mathcal{M}_h). \tag{2}
\]
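That Equations (1) and (2) agree can be illustrated on a small instance. The sketch below assumes the placement is given as a grid (list of rows) of probe strings; the 2 x 2 instance is a hypothetical example chosen for illustration and is not claimed to be the instance of Figure 1.

```python
def embed(probe, deposition):
    """Exhaustive (greedy) embedding; '-' marks positions where the cell is masked."""
    out, k = [], 0
    for ch in deposition:
        if k < len(probe) and probe[k] == ch:
            out.append(ch)
            k += 1
        else:
            out.append("-")
    if k != len(probe):
        raise ValueError("deposition sequence is not a supersequence of the probe")
    return "".join(out)


def neighbour_pairs(r, m):
    """Yield every unordered pair of neighbouring cells of an r x m array once."""
    for i in range(r):
        for j in range(m):
            if i + 1 < r:
                yield (i, j), (i + 1, j)
            if j + 1 < m:
                yield (i, j), (i, j + 1)


def bl_by_pairs(grid, deposition):
    """Equation (1): sum of Hamming distances between neighbouring embeddings."""
    emb = [[embed(p, deposition) for p in row] for row in grid]
    return sum(
        sum(x != y for x, y in zip(emb[a][b], emb[c][d]))
        for (a, b), (c, d) in neighbour_pairs(len(grid), len(grid[0]))
    )


def bl_by_masks(grid, deposition):
    """Equation (2): sum over all masks of the mask's border length."""
    emb = [[embed(p, deposition) for p in row] for row in grid]
    total = 0
    for h in range(len(deposition)):
        # In mask h a cell holds either deposition[h] or '-', so two neighbouring
        # cells form a border exactly when their entries differ.
        total += sum(
            emb[a][b][h] != emb[c][d][h]
            for (a, b), (c, d) in neighbour_pairs(len(grid), len(grid[0]))
        )
    return total


grid = [["CT", "AC"], ["TA", "CC"]]  # hypothetical placement
print(bl_by_pairs(grid, "CTAC"), bl_by_masks(grid, "CTAC"))  # 12 12
```

The two functions only differ in the order of summation (pairs first versus masks first), which is why the definitions coincide.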


The BMP® and the P-BMP® problem are defined as follows. 


Problem 1. In the BMP® problem, we are given $r, m \in \mathbb{N}$ and a multiset of $r \cdot m$
sequences $S$. The objective is to find a placement $\varphi$ and a deposition sequence $D$ so that
$BL(\varphi, D)$ is minimized.

Problem 2. In the P-BMP® problem, we are given $r, m \in \mathbb{N}$, a multiset of $r \cdot m$
sequences $S$ and a placement $\varphi$. The objective is to find a deposition sequence $D$ so that
$BL(\varphi, D)$ is minimized.

For a set $\pi \subseteq \{c, r, \ell, o\}$, we denote by BMP®$_\pi$ (P-BMP®$_\pi$) the BMP® (P-BMP®) problem
parameterized by $\pi$. For a problem BMP®$_\pi$ (P-BMP®$_\pi$) where $o \in \pi$, we assume that an
upper bound $o$ on the border length is additionally given in the input and only solutions
with minimum border length $\le o$ are admitted.

We conclude this section with some useful observations. A deposition sequence $D$ is
called redundant if it contains a character $D[i]$ such that $\varepsilon_j[i] = $ "-" for each $\varepsilon_j \in \varepsilon_D$.

Note that for any redundant deposition sequence $D$ and any placement $\varphi$, it holds that
$BL(\varphi, D) = BL(\varphi, D')$, where $D'$ is obtained by deleting the redundant character $D[i]$.
We say that a deposition sequence $D$ is good if it is not redundant.

Observation 1. Let $(\varphi, D)$ be such that $BL(\varphi, D)$ is minimized for some $(S, r, m)$. If $D$
is redundant, then there exists a subsequence $D'$ of $D$ such that $BL(\varphi, D') = BL(\varphi, D)$
and $D'$ is good.
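Observation 1 suggests a simple cleanup step: drop every position of the deposition sequence that no probe uses. The following is a minimal sketch under the assumption that probes are strings and embeddings are computed greedily as in Definition 3; `make_good` is an illustrative name.

```python
def embed(probe, deposition):
    """Exhaustive (greedy) embedding; '-' marks unused positions."""
    out, k = [], 0
    for ch in deposition:
        if k < len(probe) and probe[k] == ch:
            out.append(ch)
            k += 1
        else:
            out.append("-")
    if k != len(probe):
        raise ValueError("deposition sequence is not a supersequence of the probe")
    return "".join(out)


def make_good(deposition, probes):
    """Delete all redundant characters, i.e. positions that are '-' in every embedding."""
    embeddings = [embed(p, deposition) for p in probes]
    used = [h for h in range(len(deposition)) if any(e[h] != "-" for e in embeddings)]
    return "".join(deposition[h] for h in used)


print(make_good("abba", ["ab", "a"]))  # ab
```

Deleting an unused position does not change which later position each probe character is matched to, so the embeddings restricted to the kept positions are unchanged and the border length stays the same.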

As a consequence, when searching for optimal solutions of these problems it suffices 
to consider only good deposition sequences. Aside from the trivial (quadratic) algorithm 
for computing the border length for a fixed deposition sequence and placement, we will 
utilize another algorithm which will in some cases yield better running times: 

Proposition 1. For any given $(\varphi, D, S, r, m)$, there exists an algorithm which computes
$BL(\varphi, D)$ in time $O(|S| + p^2 \cdot |D|)$, where $p$ is the number of distinct sequences in $S$.

Proof. The algorithm proceeds in four steps. First, in time $O(|S|)$ it finds all unique
sequences in $S$ and stores them in a set $Q$ along with a mapping $\eta : S \to Q$ which maps
sequences from $S$ to their representative in $Q$. Second, in time $O(p^2 \cdot |D|)$ it computes
and stores $\mathrm{border}_D(q_1, q_2)$ for each $q_1, q_2 \in Q$. Third, in time $O(|S|)$ for each sequence
$s \in S$ it computes the set $R_s = \varphi^{-1}(\mathcal{N}(\varphi(s)))$ of neighboring sequences. Finally, in time
$O(|S|)$ it computes $\sum_{s \in S,\, r \in R_s} \mathrm{border}_D(\eta(s), \eta(r))$ (counting each pair of neighbors once),
which is easily seen to be equal to $BL(\varphi, D)$. □
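The four steps of the proof can be sketched as follows. This is an illustrative implementation, assuming the placement is given as a grid of probe strings and that each neighbouring pair is counted once, as in Equation (1); the name `border_length` is not from the paper.

```python
def embed(probe, deposition):
    """Exhaustive (greedy) embedding; '-' marks unused positions."""
    out, k = [], 0
    for ch in deposition:
        if k < len(probe) and probe[k] == ch:
            out.append(ch)
            k += 1
        else:
            out.append("-")
    if k != len(probe):
        raise ValueError("deposition sequence is not a supersequence of the probe")
    return "".join(out)


def border_length(grid, deposition):
    r, m = len(grid), len(grid[0])

    # Step 1: collect the distinct probes (the set Q) and map each probe to
    # the index of its representative (the mapping eta).
    eta = {}
    for row in grid:
        for probe in row:
            eta.setdefault(probe, len(eta))
    embeddings = [embed(q, deposition) for q in eta]  # dicts keep insertion order

    # Step 2: precompute border_D for every pair of distinct probes.
    p = len(embeddings)
    table = [
        [sum(x != y for x, y in zip(embeddings[a], embeddings[b])) for b in range(p)]
        for a in range(p)
    ]

    # Steps 3 and 4: visit each pair of neighbouring cells once and add the
    # precomputed border of the two probes placed there.
    total = 0
    for i in range(r):
        for j in range(m):
            here = eta[grid[i][j]]
            if i + 1 < r:
                total += table[here][eta[grid[i + 1][j]]]
            if j + 1 < m:
                total += table[here][eta[grid[i][j + 1]]]
    return total


print(border_length([["a", "b", "a"]], "ab"))  # 4
```

The pairwise table has $p^2$ entries of cost $|D|$ each, while the final pass only performs table lookups, which matches the shape of the stated running time.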

2.1 Parameterized Complexity 

Parameterized algorithmics is a promising approach to obtain efficient algorithms for
fragments of computationally hard problems. The aim is to find a parameter that describes
the structure of the instance such that the combinatorial explosion can be confined
to this parameter. In a parameterized complexity analysis the runtime of an algorithm
is studied with respect to the input size $n$ and a parameter $k \in \mathbb{N}$ (or a combination
of parameters). For a more detailed introduction we refer to the literature [11[9]. 

Formally, a parameterized problem is a subset of $\Sigma^* \times \mathbb{N}$, where $\Sigma$ is the input alphabet.
If a combination of parameters $k_1, \ldots, k_l$ is considered, the second component of an
instance $(x, k)$ is given by $k = \sum_{i=1}^{l} k_i$. The class FPT (fixed-parameter tractable)
contains all problems that can be decided by an algorithm running in $f(k) \cdot n^{O(1)}$ time,
where $f$ is a computable function and $n$ is the input size. Such algorithms are often
called fixed-parameter tractable (fpt).

Let $L_1$ and $L_2$ be parameterized problems, with $L_1 \subseteq \Sigma_1^* \times \mathbb{N}$ and $L_2 \subseteq \Sigma_2^* \times \mathbb{N}$. A
parameterized reduction (or fpt-reduction) from $L_1$ to $L_2$ is a mapping $P : \Sigma_1^* \times \mathbb{N} \to
\Sigma_2^* \times \mathbb{N}$ such that (1) $(x, k) \in L_1$ iff $P(x, k) \in L_2$; (2) the mapping can be computed by
an fpt-algorithm with respect to parameter $k$; (3) there is a computable function $g$ such
that $k' \le g(k)$, where $(x', k') = P(x, k)$.

There is a variety of classes capturing parameterized intractability. For our results, 
we require only the class paraNP [8], which is defined as the class of problems that
are solvable by a nondeterministic Turing machine in fpt-time. We will make use of
the characterization of paraNP-hardness given by Flum and Grohe [9, Theorem 2.14]:
any parameterized problem that remains NP-hard when the parameter is set to some
constant is paraNP-hard. Showing paraNP-hardness for a problem rules out the existence
of an fpt-algorithm under the usual complexity theoretic assumptions. 

3 Hardness 

In this section we give an overview of known results and present new (parameterized)
intractability results for BMP® and P-BMP® with respect to several combinations of
parameters. As our starting point, we notice that the NP-hardness proof for P-BMP of
Popa, Wong and Yung [17] can be straightforwardly adapted to P-BMP®.

Proposition 2 (cf. [17, Theorem 1]). P-BMP®$_{c,r}$ is paraNP-hard.

Proof. Observe that the reduction used in the proof of Theorem 1 in [17] constructs
instances of BMP which only contain 3 characters. Furthermore, while the instances are
formally defined as square arrays, all rows below the 5-th contain only a dummy character
$ and hence can be omitted without loss of generality. Finally, by Lemma 2 in [17]
it follows that optimal exhaustive solutions for these BMP instances are also optimal
solutions (in fact, it is these exhaustive solutions that are used to prove Theorem 1
in [17]). □

The hardness result for BMP® relies on a new polynomial-time reduction from P-BMP® 
to BMP®. We believe that this reduction is an interesting result on its own, as it is one of 
the first results that relates the complexity of these two problems in a general setting. We 
begin by showcasing a tool for forcibly “separating” any optimal deposition sequence. 

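Lemma 1. Let $I = (S, r, m)$ be an instance of BMP® such that each $s \in S$ consists
of a prefix $s_{pre} \in \Sigma_{pre}^*$, a fixed separator $sep \in (x^* y^*)^*$ and a suffix $s_{suf} \in \Sigma_{suf}^*$, where
$\Sigma_{pre}$, $\Sigma_{suf}$, $\{x, y\}$ form a partition of $\Sigma$. Let $u > 8 \cdot \max_{s \in S}(|s_{pre}|) + 8 \cdot \max_{s \in S}(|s_{suf}|) + 1$.
If $sep = (x^{r \cdot m \cdot u} \cdot y^{r \cdot m \cdot u})^{r \cdot m \cdot u}$, then every optimal good deposition sequence has the form
$D_{pre} \cdot sep \cdot D_{suf}$, where $D_{pre} \in \Sigma_{pre}^*$ and $D_{suf} \in \Sigma_{suf}^*$.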

Proof. Notice that $r \cdot m \cdot u - 1$ forms a trivial upper bound on the border length of
$I$, as witnessed by any deposition sequence of the form $D_{pre} \cdot sep \cdot D_{suf}$ (regardless of
placement). Indeed, there are at most $4r \cdot m$ pairs of neighboring cells in the array, and
for each such pair the border length is bounded by the Hamming distance between the
embeddings placed on these cells, where any deposition sequence of this form yields a
bound of $2 \cdot \max_{s \in S}(|s_{pre}|) + 2 \cdot \max_{s \in S}(|s_{suf}|)$.

Consider any optimal good deposition sequence $D$ and let $p \in \Sigma_{pre}$, $q \in \Sigma_{suf}$.
Consider for a contradiction that $qp$ is a subsequence of $D$. Then $sep \cdot qp$ would also be a
subsequence of $D$; however, each mask for a character in $sep$ would yield an increase of
the border length by at least 1, since the array contains a cell where this
mask cannot be applied (specifically, this is the cell containing the sequence beginning
with $p$). This would already break the upper bound provided above, hence $qp$ cannot be
a subsequence of $D$.

Next, consider for a contradiction that $qy$ is a subsequence of $D$. Then
$sep \cdot qy = (x^{r \cdot m \cdot u} \cdot y^{r \cdot m \cdot u})^{r \cdot m \cdot u} \cdot qy$ is also a subsequence of $D$. This means that there
exist two embeddings $\varepsilon_1, \varepsilon_2$ which differ in the positions of their first, second, third, ...,
$(r \cdot m \cdot u)^2$-th $y$ characters. Let $\mathit{offset}_x$ be the number of masks for $x$ which occur between the
position of the first $y$ character in $\varepsilon_1$ and the first $y$ character in $\varepsilon_2$; notice that
$0 < \mathit{offset}_x \le r \cdot m \cdot u$. Each mask for $x$ in the offset has a border length of at least 1, since
there is a sequence S 2 in the array which begins with y. If offset^ < r-m-u then the upper- 
bounded on the border length of D is broken by the fact that that occurs 

(r • m • u)-many times in succession in the deposition sequence, and each occurrence would 
necessarily increase the border length by at least 1. On the other hand, if offset^ = r-m-u 
then the upper-bound on the border length would be broken already by all the masks 
for X which occur in the offset. 

By a symmetric argument, we obtain that $xp$ also cannot occur as a subsequence of
$D$. Hence the deposition sequence must have the form $D_{pre} \cdot sep \cdot D_{suf}$. □

Observe that "flipping" the array horizontally or vertically preserves the optimal border
length but formally changes the placement $\varphi$. The purpose of the following key
lemma is to provide a tool to fix the optimal positions of probes in the array; to this
end, we will be considering placements which are unique up to these simple symmetries.

Lemma 2. Let $a, b, x, y \in \Sigma$ and $r, m, t \in \mathbb{N}$. Consider an $r \times m$ array, and probes
$S = \{a^{i \cdot t} \cdot sep \cdot b^{j \cdot t} \mid i \in [r] \text{ and } j \in [m]\}$. Then:

1. the unique optimal placement $\varphi_0$ (up to simple symmetries) places each probe
   $a^{i \cdot t} \cdot sep \cdot b^{j \cdot t}$ in cell $(i, j)$,

2. the unique optimal good deposition sequence is $D_0 = a^{r \cdot t} \cdot sep \cdot b^{m \cdot t}$, and

3. for any placement $\varphi \ne \varphi_0$ (except for symmetries of $\varphi_0$) and any deposition
   sequence $D$, it holds that $BL(\varphi, D) \ge BL(\varphi_0, D_0) + t$.

Proof. We proceed in two steps. First, we compute the border length of $(\varphi_0, D_0)$. Then,
we establish that $\varphi_0$ is the only optimal placement up to the above-mentioned simple
symmetries, and that other placements yield a border length which is lower-bounded by
$t + BL(\varphi_0, D_0)$. Notice that $D_0$ is the only optimal good deposition sequence regardless
of placement by Lemma 1.

Claim. $BL(\varphi_0, D_0) = ((r - 1) \cdot m + r \cdot (m - 1)) \cdot t$.



Figure 2: An r x m array. The corners and the perimeter are highlighted in gray. 


Proof of Claim: For character $a$, we start with $t$-many masks that contain character $a$
in each cell. Notice that these masks have border length zero. Then we continue with
$t$-many masks that have character "-" in the first row and character $a$ everywhere else.
Each of these masks has border length $m$. Next we use $t$-many masks where the first two
rows contain character "-", and so on. In total, we obtain a border length of $(r - 1) \cdot m \cdot t$
for character $a$. For characters $x$ and $y$, all masks contain character $x$ or $y$ in each cell and
hence all have a border length of zero. Finally, for character $b$ the procedure is analogous;
we simply swap columns and rows. This gives a border length of $r \cdot (m - 1) \cdot t$ for
character $b$. ■
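The value stated in the Claim can be checked numerically for small arrays. The sketch below builds the probes of Lemma 2 with the placement that puts the probe for $(i, j)$ into cell $(i, j)$, and uses the short stand-in separator "xy" instead of the long separator of Lemma 1 (an assumption made only to keep the example small; the separator contributes no border because all probes share it).

```python
def embed(probe, deposition):
    """Exhaustive (greedy) embedding; '-' marks unused positions."""
    out, k = [], 0
    for ch in deposition:
        if k < len(probe) and probe[k] == ch:
            out.append(ch)
            k += 1
        else:
            out.append("-")
    if k != len(probe):
        raise ValueError("deposition sequence is not a supersequence of the probe")
    return "".join(out)


def border_length(grid, deposition):
    """Sum of Hamming distances between embeddings in neighbouring cells."""
    emb = [[embed(p, deposition) for p in row] for row in grid]
    r, m = len(grid), len(grid[0])
    total = 0
    for i in range(r):
        for j in range(m):
            if i + 1 < r:
                total += sum(x != y for x, y in zip(emb[i][j], emb[i + 1][j]))
            if j + 1 < m:
                total += sum(x != y for x, y in zip(emb[i][j], emb[i][j + 1]))
    return total


def lemma2_instance(r, m, t, sep="xy"):
    """Probes a^(i*t) . sep . b^(j*t) placed in cell (i, j), plus D_0."""
    grid = [
        ["a" * (i * t) + sep + "b" * (j * t) for j in range(1, m + 1)]
        for i in range(1, r + 1)
    ]
    d0 = "a" * (r * t) + sep + "b" * (m * t)
    return grid, d0


r, m, t = 2, 3, 2
grid, d0 = lemma2_instance(r, m, t)
print(border_length(grid, d0), ((r - 1) * m + r * (m - 1)) * t)  # 14 14
```

Every pair of neighbouring cells contributes exactly $t$, so the total is $t$ times the number of neighbouring pairs, in line with the Claim.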

Now consider any optimal solution $(\varphi, D)$. The fact that $D = D_0$ follows from Lemma 1.
We now proceed to the core of our proof. Notice that for each pair of probes $s_1, s_2 \in S$
it holds that $\mathrm{border}_{D_0}(s_1, s_2) \ge t$. We say that $s_1, s_2$ are similar if $\mathrm{border}_{D_0}(s_1, s_2) = t$.
Since the number of pairs of cells which are neighbors in an $r \times m$ array is exactly
$(r - 1) \cdot m + r \cdot (m - 1)$ and $BL(\varphi_0, D_0) = ((r - 1) \cdot m + r \cdot (m - 1)) \cdot t$, any optimal
placement $\varphi$ may only place probes which are similar into neighboring cells. Furthermore,
if a placement $\varphi$ is not optimal, then $BL(\varphi, D) \ge t + BL(\varphi_0, D_0)$ since for any $s_1, s_2$
which are not similar it holds that $\mathrm{border}_{D_0}(s_1, s_2) \ge 2t$.

Let us call the cells which have at most 3 neighbors in the array the perimeter
and the cells which have at most 2 neighbors the corners. For the final part of the
proof, we use the inductive assumption that $\varphi_0$ is the unique optimal placement for all
$r' \times m'$ arrays such that $r' < r$ and $m' < m$ as long as the placement of at least two
corners is fixed. Furthermore, we assume that $\min(r, m) > 1$; the lemma trivially holds
for $\min(r, m) = 0$, and it is easily seen that for $\min(r, m) = 1$ the optimal placement must be
an ascending sequence, which is unique if its corners/endpoints are fixed.

For each $s \in S$, let $sim(s)$ denote the set of probes which are similar to $s$. Notice that
there are precisely four probes such that $|sim(s)| = 2$ and precisely $2r + 2m - 4$ probes
such that $|sim(s)| = 3$, and there is a unique (up to symmetry) placement of these probes
in the corners and perimeter so that similar probes are placed on neighboring cells (see
Fig. 2). Let $S_0$ contain all the probes placed into the perimeter.

Notice that the placement of these probes on the perimeter precisely matches $\varphi_0$, and
the placement of probes such that $|sim(s)| \le 2$ in $S' = S - S_0$ is fixed by the placement
of $S_0$ in the perimeter.

If $\min(r, m) = 2$ then this concludes the proof. If $\min(r, m) = 3$ then the remaining
placement reduces to the placement of $S' = S - S_0$ into a one-dimensional array, which is
unique when the corners are fixed. Finally, if $\min(r, m) \ge 4$ then the remaining placement
reduces to the placement of $S'$ into an $(r - 2) \times (m - 2)$ array, which is again unique by
our inductive hypothesis. □


With Proposition 2 and Lemma 2 we can proceed to:


Theorem 1. BMP®$_{c,r}$ is paraNP-hard.

Proof. We provide a reduction from P-BMP®$_{c,r}$, which is paraNP-hard by Proposition 2.
Let $\Sigma'$ be the alphabet of P-BMP®$_{c,r}$, $x_1, y_1, x_2, y_2 \notin \Sigma'$ and $\Sigma = \Sigma' \cup \{x_1, y_1, x_2, y_2\}$.
From any instance $I' = (S', \varphi', r, m)$ of P-BMP®$_{c,r}$ we construct an instance $I = (S, r, m)$
of BMP®$_{c,r}$ as follows. For each $s \in S'$ such that $\varphi'(s) = (i, j)$ we put
$a^{i \cdot t} \cdot sep_1 \cdot b^{j \cdot t} \cdot sep_2 \cdot s$ into $S$, where:

• t > (maxses'lsj ■ r • m)^. 


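• $sep_1 = (x_1^{r \cdot m \cdot u_1} \cdot y_1^{r \cdot m \cdot u_1})^{r \cdot m \cdot u_1}$.

• $sep_2 = (x_2^{r \cdot m \cdot u_2} \cdot y_2^{r \cdot m \cdot u_2})^{r \cdot m \cdot u_2}$.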


• the constants ui,U 2 for sepi and sep 2 respectively are sufficiently large so as to 
satisfy the condition of Lemma (H for instance, U 2 > lOOt^ and ui > lOOOt^. 

By Lemma 1 we have that any optimal good deposition sequence for $I$ must have
the form $a^{r \cdot t} \cdot sep_1 \cdot b^{m \cdot t} \cdot sep_2 \cdot D'$. Let us now compare an arbitrary solution $(\varphi, D)$ to
$(\varphi', D)$. By Lemma 2 either $\varphi$ is equivalent to $\varphi'$ by symmetry, or the border length of
masks for $a, x_1, y_1, b$ in $(\varphi, D)$ will be at least $t$ greater than the border length of these
masks in $(\varphi', D)$. However, $t$ was chosen to be sufficiently large to exceed the worst-case
border length of all masks for $\Sigma'$. So we conclude that any optimal solution for $I$ must
use a placement which is either the same as or symmetric to $\varphi'$.

Finally, observe that after the last mask of $sep_2$ is applied, the remainder of $I$ is
equivalent to $I'$, and hence $D'$ is also a solution to $I'$. □


Theorem 1 and Proposition 2 show that one cannot hope to find an fpt-algorithm for
BMP® or P-BMP® parameterized by any subset of $\{c, r\}$. These results complete the
hardness part of our complexity map for BMP® and P-BMP®. For BMP®$_{c,\ell}$ it remains open
whether the problem is fixed parameter tractable. Still, we can relate this problem to
$k$-Balanced Partition, a problem well studied in the literature.

In a $k$-Balanced Partition instance we are given a graph $G = (V, E)$ with $|V| = n$.
The question is to find a partition of the vertices $V$ into $k$ sets $V_1, \ldots, V_k$ such that
$|V_i| \le \lceil n/k \rceil$ for all $1 \le i \le k$, and the cut size (i.e., the number of edges $\{x, y\}$ such
that $x \in V_i$, $y \in V_j$, and $i \ne j$) is minimized. We remark that, to the best of our
knowledge, the parameterized complexity of $k$-Balanced Partition parameterized by
$k$ is open on solid rectangular grids [5]. Below we show that $k$-Balanced Partition
on solid rectangular grids can be reduced to BMP® and hence BMP® is at least as hard
as $k$-Balanced Partition.


Proposition 3. There is a polynomial-time reduction from $k$-Balanced Partition
on solid rectangular grids to BMP®.

Proof. Let $G = (V, E)$ be a solid rectangular grid of size $r \times m$ with $|V| = n$. Further,
let $l, x \in \mathbb{N}_0$ such that $n = l \cdot k + x$ with $x < k$, and let $\Sigma = \{c_1, \ldots, c_k\}$. We construct a probe set
$S = \{f(i) \text{ copies of sequence } c_i \mid i \in [k]\}$, where $f(i) = l + 1$ if $1 \le i \le x$ and
$f(i) = l$ otherwise. It is easy to verify that the placement of a BMP® solution also gives
a solution to $k$-Balanced Partition if the characters in $\Sigma$ are seen as partition
sets $V_1, \ldots, V_k$. □
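The construction in the proof can be sketched as follows. The function name and the representation of characters as the strings "c1", ..., "ck" are assumptions made for illustration; each probe is a one-character sequence, stored here as a 1-tuple.

```python
from collections import Counter


def balanced_partition_to_bmp(r, m, k):
    """Probe multiset for an r x m grid and k parts: f(i) copies of the
    one-character probe c_i, with f(i) = l + 1 for i <= x and l otherwise,
    where r * m = l * k + x and 0 <= x < k."""
    n = r * m
    l, x = divmod(n, k)
    probes = []
    for i in range(1, k + 1):
        copies = l + 1 if i <= x else l
        probes.extend([(f"c{i}",)] * copies)
    return probes


probes = balanced_partition_to_bmp(2, 4, 3)
print(Counter(probes))
```

For a 2 x 4 grid and k = 3 this yields three copies each of c1 and c2 and two copies of c3, so every part has at most ⌈8/3⌉ = 3 cells, and two neighbouring cells contribute to the border exactly when they hold different characters.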

4 Fpt-Algorithms 

In the following sections we discuss fpt-algorithms for several parameters. The first group
focuses on sequences of moderate length and an array whose size is primarily growing
in one dimension, i.e., on the parameters $c$, $\ell$, and $r$. In contrast, the second group
parameterizes by $c$ and the maximum admissible border length $o$.

4.1 Fpt-Algorithm for P-BMP®$_{c,\ell}$

Our first algorithm provides a basic introduction to the techniques used later on.

Observation 2. For any instance $(S, r, m)$ of BMP®$_{c,\ell}$, there are at most $c^\ell$ unique
sequences in $S$.

Lemma 3. For any instance $(S, r, m)$ of BMP®$_{c,\ell}$ or any instance $(S, \varphi, r, m)$ of P-BMP®$_{c,\ell}$
it holds that $|D| \le c^\ell \cdot \ell$ for any good deposition sequence $D$.

Proof. Assume towards contradiction that there is a good deposition sequence $D$ which
contains $|D| > c^\ell \cdot \ell$ characters. Since the total number of distinct sequences $s_i \in S$ is
bounded by $c^\ell$, the total number of distinct embeddings is also bounded by $c^\ell$. Each
embedding $\varepsilon_i$ contains at most $\ell$ characters other than "-". Hence by the pigeon-hole principle
there must exist some $j \in [|D|]$ such that $\varepsilon_i[j] = $ "-" for all $i \in [|S|]$, which implies that
$D$ is not good (contradiction). □

At this point we can already prove: 

Proposition 4. P-BMP®$_{c,\ell}$ is fixed parameter tractable, and there exists an algorithm
for P-BMP®^ which runs in time c^ |<S|. 

Proof. By Lemma El it suffices to search for deposition sequences of length at most c^ ■£. 
We loop through all of the at most c^ such deposition sequences, and for each sequence 
D we compute 'SL{p,D) in time 0(|<S| + p^ ■ |L)|) by Proposition [H By Observation [2] 
and LemmaEl we obtain that 0(|5| + p^ ■ |D|) = 0(|<S| + c^^tj, which altogether yields 
the runtime bound of |5|. □ 
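The exhaustive search behind Proposition 4 can be illustrated on toy instances. The sketch below is not the authors' implementation: it enumerates all candidate deposition sequences up to the pigeonhole bound from the proof of Lemma 3 (number of distinct probes times the maximum probe length) and evaluates each one naively; the names `solve_pbmp` and `is_supersequence` are hypothetical.

```python
from itertools import product


def embed(probe, deposition):
    """Exhaustive (greedy) embedding; '-' marks unused positions."""
    out, k = [], 0
    for ch in deposition:
        if k < len(probe) and probe[k] == ch:
            out.append(ch)
            k += 1
        else:
            out.append("-")
    return "".join(out)


def border_length(grid, deposition):
    """Sum of Hamming distances between embeddings in neighbouring cells."""
    emb = [[embed(p, deposition) for p in row] for row in grid]
    r, m = len(grid), len(grid[0])
    total = 0
    for i in range(r):
        for j in range(m):
            if i + 1 < r:
                total += sum(x != y for x, y in zip(emb[i][j], emb[i + 1][j]))
            if j + 1 < m:
                total += sum(x != y for x, y in zip(emb[i][j], emb[i][j + 1]))
    return total


def is_supersequence(deposition, probe):
    """True if `probe` is a subsequence of `deposition`."""
    remaining = iter(deposition)
    return all(ch in remaining for ch in probe)


def solve_pbmp(grid, alphabet):
    """Return (border length, deposition sequence) of a best exhaustive solution."""
    probes = {p for row in grid for p in row}
    # A good deposition sequence has at most (#distinct probes) * (max length) characters.
    max_len = len(probes) * max(len(p) for p in probes)
    best = None
    for length in range(max_len + 1):
        for candidate in map("".join, product(alphabet, repeat=length)):
            if all(is_supersequence(candidate, p) for p in probes):
                cost = border_length(grid, candidate)
                if best is None or cost < best[0]:
                    best = (cost, candidate)
    return best


print(solve_pbmp([["a", "b", "a"]], "ab"))  # (4, 'ab')
```

The number of candidates grows exponentially in the length bound, which depends only on the alphabet size and the probe length; this is the source of the fixed-parameter running time.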
4.2 Fpt-Algorithm for BMP®$_{c,\ell,r}$

We first introduce some notation for our arrays. Given an $r \times m$ array $A$, a column is an
$r \times 1$ sub-array of $A$. A column placement into a column of $A$ is a mapping $\varphi : [r] \to S$
from the cells of the column to the multiset of probes.

Observation 3. For any instance {S,r,m) o/BMP®, it holds that there are at most 
distinct column placements. 

Hence for any fixed $r$ and $S$, we can enumerate all possible column placements as
$\varphi_1, \varphi_2, \ldots$. Observe that, for any two column placements $\varphi_t, \varphi_{t'}$ it holds that either
(i) $t = t'$ and $\varphi_t(x) = \varphi_{t'}(x)$ for all $x \in [r]$, or (ii) $t \ne t'$ and $\varphi_t(x) \ne \varphi_{t'}(x)$ for at least
one $x \in [r]$.

Any placement (/? : s G <S i-)- (a G N, 6 G N) into A can be uniquely decomposed 
into a sequence of column placements ■ ■ ■ Fi{rn)) where iPi(x){y) = 

and i : [m] —>■ The column placement with j G [m] denotes that the j-ih. 

column of A is of placement i{j). Furthermore, since (/? is closed under permutation of 
non-distinct sequences in 5, each column placement can be uniquely identified by an 
r-tuple of sequences from <S, formally = (•si) '§ 2 ,..., Sr) Fi{x){y) = Sy for all 

y G H- 

Next, we prove that when searching for optimal solutions for BMP® it suffices to restrict 
ourselves to placements such that identical column placements appear in “consecutive 
blocks”. 

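Lemma 4. Let $(S, r, m)$ be an instance of BMP®, $D$ be a deposition sequence and $\varphi$
be a placement which decomposes into $(\varphi_{i(1)}, \varphi_{i(2)}, \ldots, \varphi_{i(m)})$. Then if there exist $a, b \in [m]$,
$a + 1 < b$, such that $\varphi_{i(a)} = \varphi_{i(b)}$ but $\varphi_{i(a+1)} \ne \varphi_{i(b)}$, then $BL(\varphi, D) \ge BL(\varphi', D)$,
where $\varphi'$ decomposes into

$(\varphi_{i(1)}, \ldots, \varphi_{i(a)}, \varphi_{i(b)}, \varphi_{i(a+1)}, \varphi_{i(a+2)}, \ldots, \varphi_{i(b-1)}, \varphi_{i(b+1)}, \ldots, \varphi_{i(m)})$.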

Proof. Recall that by Equation [H BL((/?, D) is equal to the sum of Hamming distances 
of embeddings border£)(sp, Sg) between neighboring Sp,Sg G S. Since the embeddings, 
and hence also the Hamming distances, are the same for B\j{ip, D) as for BL(<y9', D), the 
only difference between these values may arise from which sequences are neighbors. 

We say that two neighboring cells $v_1 = (x_1, y_1)$ and $v_2 = (x_2, y_2)$ are x-neighbors if
$|x_1 - x_2| = 1$ and y-neighbors otherwise, i.e., if $|y_1 - y_2| = 1$; let $N_x(v)$ and $N_y(v)$ contain
the x-neighbors and y-neighbors of $v$, respectively. Notice that y-neighborhoods are identical
between $\varphi$ and $\varphi'$, since the latter is obtained by permuting whole columns of the
former. On the other hand, consider the difference between x-neighboring sequences in $\varphi$
and $\varphi'$. Notice that $\varphi'$ is obtained by a simple permutation of the column placements of $\varphi$
and in particular these differ only in the borders between $\{\varphi_{i(a)}, \varphi_{i(a+1)}, \varphi_{i(b-1)}, \varphi_{i(b)}, \varphi_{i(b+1)}\}$.
For convenience, we use bd to denote the total “horizontal” border between two column
placements; formally:

\[ \mathrm{bd}(u, t) = \sum_{x \in [r]} \mathrm{border}_D(\varphi_{i(u)}(x), \varphi_{i(t)}(x)). \]

Now we can express the difference between the border lengths of both placements as

\[ \mathrm{BL}(\varphi', D) = \mathrm{BL}(\varphi, D) + \mathrm{bd}(a, b) + \mathrm{bd}(b, a+1) + \mathrm{bd}(b-1, b+1) - \mathrm{bd}(a, a+1) - \mathrm{bd}(b-1, b) - \mathrm{bd}(b, b+1). \]

Since $\varphi_{i(a)} = \varphi_{i(b)}$, it holds that $\mathrm{bd}(a, b) = 0$ and $\mathrm{bd}(b, a+1) = \mathrm{bd}(a, a+1)$.
Furthermore, since the triangle inequality holds for Hamming distances (and border
is defined as a Hamming distance between two sequences), we obtain
$\mathrm{bd}(b-1, b+1) - \mathrm{bd}(b-1, b) - \mathrm{bd}(b, b+1) \leq 0$. Hence we conclude that $\mathrm{BL}(\varphi', D) \leq \mathrm{BL}(\varphi, D)$. □
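The exchange argument of Lemma 4 can be checked numerically on a toy instance. The following is a minimal sketch, assuming that every probe is already represented by its embedding as a 0/1 string of equal length, so that the border between two probes is simply the Hamming distance of these strings; the helper names (`hamming`, `bd`, `horizontal_border`, `move_next_to`) and the example columns are illustrative and do not come from the paper.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def bd(col_u, col_t):
    """Horizontal border between two columns: row-wise sum of Hamming distances."""
    return sum(hamming(x, y) for x, y in zip(col_u, col_t))

def horizontal_border(cols):
    """Total horizontal border of a left-to-right sequence of columns."""
    return sum(bd(cols[j], cols[j + 1]) for j in range(len(cols) - 1))

def move_next_to(cols, a, b):
    """Move column b directly behind column a (0-based indices, a + 1 < b)."""
    return cols[:a + 1] + [cols[b]] + cols[a + 1:b] + cols[b + 1:]

# Hypothetical 2-row array; each column is a tuple of embeddings (assumed data).
cols = [
    ("1100", "1010"),
    ("0110", "0011"),
    ("1111", "0000"),
    ("1100", "1010"),  # identical to column 0
    ("0001", "1000"),
]
before = horizontal_border(cols)
after = horizontal_border(move_next_to(cols, 0, 3))
print(before, after)
```

Vertical borders are not recomputed here because, as argued in the proof, permuting whole columns leaves them unchanged.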

We say that a placement $\varphi$ is consecutive if it decomposes into column placements
$(\varphi_{i(1)}, \varphi_{i(2)}, \dots, \varphi_{i(m)})$ where for each $\varphi_{i(a)}, \varphi_{i(b)}$ such that $\varphi_{i(a)} = \varphi_{i(b)}$ and $a < b$ it holds
that $\varphi_{i(a)} = \varphi_{i(c)}$ for all $a < c < b$.

Corollary 1. For any BMP® instance $(S, r, m)$, there exists an optimal solution $(\varphi, D)$
such that $\varphi$ is consecutive.

Proof. Let $(\varphi', D)$ be a solution for $(S, r, m)$. We can repeatedly apply Lemma 4 until
we obtain a consecutive placement; notice that the number of times Lemma 4 can be
applied is bounded by $m$. □
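The notion of a consecutive placement can be made concrete with a short sketch. It assumes that a placement is given as a list of hashable column placements (for instance tuples of probes); `is_consecutive` and `group_columns` are hypothetical helper names. Grouping by first occurrence is one way to reach the arrangement that repeated moves in the style of Lemma 4 lead to; it is not claimed to be the exact order of moves used in the proof.

```python
def is_consecutive(cols):
    """True if identical columns appear only in one contiguous block each."""
    seen = set()
    prev = None
    for c in cols:
        if c != prev:
            if c in seen:      # this column placement already had a block earlier
                return False
            seen.add(c)
        prev = c
    return True

def group_columns(cols):
    """Reorder columns so that equal ones are adjacent, keeping first-occurrence order."""
    order = list(dict.fromkeys(cols))  # distinct columns, in order of first appearance
    return [c for key in order for c in cols if c == key]

cols = [("a", "b"), ("c", "c"), ("a", "b"), ("d", "a"), ("c", "c")]
grouped = group_columns(cols)
print(is_consecutive(cols), is_consecutive(grouped))
```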

The next algorithm uses an Integer Linear Programming (ILP) subroutine. ILP is a
well-known framework for formulating problems and a powerful tool for the development
of fpt-algorithms for optimization problems. In the following, we only give a brief overview
of the framework before we present the algorithm.

Definition 7 ($p$-Variable Integer Linear Programming Optimization). Let $A \in \mathbb{Z}^{q \times p}$,
$b \in \mathbb{Z}^{q \times 1}$ and $c \in \mathbb{Z}^{1 \times p}$. The task is to find a vector $x \in \mathbb{Z}^{p \times 1}$ which minimizes the objective
function $c \times x$ and satisfies all $q$ inequalities given by $A$ and $b$, specifically satisfies
$A \cdot x \geq b$. The number of variables $p$ is the parameter.
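As a small illustration of the definition (an example chosen here, not taken from the paper), consider an instance with $p = 2$ variables and $q = 3$ inequalities:

```latex
\[
A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
b = \begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix}, \qquad
c = \begin{pmatrix} 1 & 2 \end{pmatrix}.
\]
% The constraints A x >= b read: x_1 + x_2 >= 3, x_1 >= 0, x_2 >= 0.
% The objective is c x = x_1 + 2 x_2 = (x_1 + x_2) + x_2 >= 3 + 0,
% so the minimum value 3 is attained at x = (3, 0)^T.
```

The multiplicity computation in the algorithm below has the same shape: a few integer variables, linear constraints, and a linear objective.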

Lenstra m showed that p-ILP, together with its optimization variant p-OPT-ILP 
(defined above), are in FPT. His running time was subsequently improved by Kannan 
m and Frank and Tardos m (see also 0 )- 

Theorem 2 ( [71 [TTl[T3lHI] ). $p$-OPT-ILP can be solved using $O(p^{2.5p + o(p)} \cdot L)$ arithmetic
operations in space polynomial in $L$, $L$ being the number of bits in the input.

We are now ready to prove the main theorem of this subsection. 

Theorem 3. BMP®$_{c,\ell,r}$ is fixed parameter tractable, and there exists an algorithm for
BMP®$_{c,\ell,r}$ which runs in time c® · $|S|$.

Proof. We give a multi-step algorithm for BMP®$_{c,\ell,r}$.

1. We branch on the choice of deposition sequence $D$. By an earlier observation, it suffices
to consider only good deposition sequences, and Lemma 3 bounds the number of good
deposition sequences.

2. In view of Corollary 1 we branch on which column placements appear in $\varphi$ and the
order in which they appear. Formally, we construct the set of all distinct column
placements $T = \{\varphi_1, \dots, \varphi_{|T|}\}$ and branch on all nonempty subsets $T' \subseteq T$. We then branch
on all mappings $f : [t] \to [|T|]$ where $t = |T'|$. Since $|T|$ is bounded by Observation 3,
the number of choices of $f$ is bounded as well.

For each fixed $f$, we hence obtain a template $Q_f = (\varphi_{f(1)}, \dots, \varphi_{f(t)})$. A consecutive
placement $\varphi$ matches a template $Q_f$ if there exists a multiplicity function $h : [t] \to \mathbb{N}$
such that $\varphi$ decomposes into $(h(1) \cdot \varphi_{f(1)}, h(2) \cdot \varphi_{f(2)}, \dots, h(t) \cdot \varphi_{f(t)})$ where $x \cdot \varphi_z$ is
shorthand for $x$ consecutive copies of $\varphi_z$.

3. We compute the following constants:

• For each column placement $\varphi_i = (s_1, s_2, \dots, s_r) \in T'$ we compute the total
cost of its “vertical borders” $\mathrm{bd}_i^{\mathrm{vert}}$ as follows:

\[ \mathrm{bd}_i^{\mathrm{vert}} = \sum_{z \in [r-1]} \mathrm{border}_D(s_z, s_{z+1}). \]

• We also compute the total “horizontal cost”, which depends only on $D$ and
$Q_f$ (since identical column placements do not have horizontal borders), as
follows:

\[ \mathrm{cost}_h = \sum_{z \in [r],\, u \in [t-1]} \mathrm{border}_D(\varphi_{f(u)}(z), \varphi_{f(u+1)}(z)). \]

• For each distinct $s \in S$ let $\#_s$ contain the number of occurrences of $s$ in $S$.

• For each distinct $s \in S$ and $\varphi_i$ let $\#_s^i$ contain the number of occurrences of $s$
in $\varphi_i$.

4. We construct and solve a $p$-OPT-ILP instance $I$ to compute the multiplicity
function $h$, which contains the “vertical cost” variable $\mathrm{cost}_v$, the variables
$h(1), \dots, h(t)$ and the following constraints:

a) For each distinct $s \in S$: $\#_s = \sum_{z \in [t]} h(z) \cdot \#_s^{f(z)}$.

b) $\forall z \in [t] : h(z) > 0$.

c) $\mathrm{cost}_v = \sum_{z \in [t]} h(z) \cdot \mathrm{bd}_{f(z)}^{\mathrm{vert}}$.

d) Minimize $\mathrm{cost}_v$.

The intuition of the constraints is as follows. Constraints of type a) ensure that
the choice of multiplicities does not introduce too many/too few occurrences of
some probe $s$ in the array. By the constraints of type b) it is ensured that the
multiplicities are strictly positive. With help of constraint c) the vertical border
cost for a certain choice of multiplicities is computed, which is in turn minimized
by constraint d).

5. Finally, for each choice of $D$, $T'$ and $f$ we store $\mathrm{cost}_v + \mathrm{cost}_h$ and the table of
values $h = (h(1), \dots, h(t))$ from the optimal solution of $I$. After the branching
is complete, we choose an arbitrary branch with minimum $\mathrm{cost}_v + \mathrm{cost}_h$ and read
the values $D$, $f$, $h$ associated with this branch. The algorithm then outputs $(\varphi, D)$
where $\varphi$ is computed from the template $Q_f$ given by $f$ and the multiplicity function
given by $h$.
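Steps 3 and 4 can be illustrated for a single branch, i.e., for one fixed deposition sequence and one fixed template. The sketch below makes two loud assumptions: probes are represented directly by equal-length 0/1 embedding strings so that `border` is a plain Hamming distance, and the $p$-OPT-ILP instance is replaced by exhaustive search over all multiplicity vectors, which is exponential in $t$ and only meant for toy inputs. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def vertical_cost(col, border):
    """bd^vert of one column placement: borders between vertically adjacent cells."""
    return sum(border(col[z], col[z + 1]) for z in range(len(col) - 1))

def horizontal_cost(template, border):
    """cost_h: borders between consecutive (distinct) column placements of the template."""
    return sum(border(u[z], v[z])
               for u, v in zip(template, template[1:])
               for z in range(len(u)))

def best_multiplicities(template, probe_counts, m, border):
    """Brute-force stand-in for the ILP (NOT an ILP solver).

    Returns (cost_v, h) minimizing cost_v subject to constraints a) and b),
    or None if no multiplicity vector reproduces the probe multiset.
    """
    vert = [vertical_cost(col, border) for col in template]
    occ = [Counter(col) for col in template]
    best = None
    for h in product(range(1, m + 1), repeat=len(template)):  # b) h(z) > 0
        used = Counter()
        for mult, o in zip(h, occ):
            for s, k in o.items():
                used[s] += mult * k
        if used != probe_counts:                              # a) counts must match
            continue
        cost_v = sum(mult * v for mult, v in zip(h, vert))    # c) vertical cost
        if best is None or cost_v < best[0]:                  # d) minimize
            best = (cost_v, h)
    return best

# Toy branch: r = 2 rows, m = 3 columns, two distinct probes (assumed data).
template = [("110", "110"), ("110", "011")]
probe_counts = Counter({"110": 4, "011": 2})
result = best_multiplicities(template, probe_counts, 3, hamming)
print(result, horizontal_cost(template, hamming))
```

In the algorithm itself this search is performed by the ILP of Step 4, which is what yields the fpt running time; the brute force above only mirrors the constraints.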

Running time. The number of branches processed after Step 1 and Step 2 is bounded
by the product of the bounds from Step 1 and Step 2, and this branching can be initialized
in $O(|S|)$ time. Step 3 and the construction of $I$ can both also be completed in linear time,
assuming multisets are implemented via a multiplicity function. $I$ contains $t \leq |T|$ variables
(with $|T|$ bounded by Observation 3) and has size linear in $S$, and can thus be solved in
time at most c'^ · $|S|$ by Theorem 2. The time
required to process Step 5 is easily seen to be dominated by Step 1 and 4.

Correctness. Assume for a contradiction that the algorithm outputs $(\varphi, D)$ but there
exists an optimal solution $(\varphi', D')$ such that $\mathrm{BL}(\varphi', D') < \mathrm{BL}(\varphi, D)$. Consider the
template $Q'_f$ and multiplicity function $h'$ associated with $\varphi'$. During the computation of our
algorithm, the branch of $Q'_f$ and $D'$ had correctly computed the $\mathrm{cost}'_h$ component of
$\mathrm{BL}(\varphi', D')$. Furthermore, since $(\varphi', D')$ is optimal, we obtain that $h'$ must be an optimal
solution for the $p$-OPT-ILP instance $I'$ constructed for this branch; let $\mathrm{cost}'_v$ be the
output of $I'$. Then $\mathrm{BL}(\varphi', D') = \mathrm{cost}'_v + \mathrm{cost}'_h$ implies that $\mathrm{cost}'_v + \mathrm{cost}'_h < \mathrm{cost}_v + \mathrm{cost}_h$,
which contradicts the assumed choice of branch $D$ and $Q_f$ in Step 5. □

4.3 Fpt-Algorithm for P-BMP®$_{c,o}$

Given an r x m array, a mask A4 is called trivial if Af(i, j) ^ ” for all i G [r], j G [m]. 

Given a deposition sequence D, we say that a subsequence D' of D is primal if it is 
obtained from D by deleting all characters which are associated with a trivial mask. 
Notice that the border length of each mask associated with each character in a primal 
sequence is at least one, and the border length of all trivial masks is 0. For the purpose 
of providing concise running times, we use n to denote the size of the input. 

Observation 4. For any instance of P-BMP® and BMP®, the number of primal sequences
is bounded by $\sum_{i=1}^{o} c^i \leq o \cdot c^o$.
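The counting behind Observation 4 can be reproduced by enumerating all candidate primal sequences, i.e., all strings of length 1 to $o$ over the alphabet. This is a minimal sketch with a hypothetical helper name and an example alphabet chosen here; it only illustrates the bound and does not check whether a candidate is actually realizable.

```python
from itertools import product

def candidate_primal_sequences(alphabet, o):
    """Yield every string of length 1..o over the alphabet (candidates only)."""
    for length in range(1, o + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

alphabet, o = "ACGT", 3          # c = 4 characters, border length bound o = 3
count = sum(1 for _ in candidate_primal_sequences(alphabet, o))
bound = o * len(alphabet) ** o   # the bound o * c^o from Observation 4
print(count, bound)
```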

Additionally, since the number of “borders” between distinct probes is bounded from 
below by the number of distinct probes, we obtain: 

Observation 5. Let $S$ be a multiset of probes. For any YES-instance of P-BMP® and
BMP® over $S$, the number of distinct probes in $S$ is upper-bounded by $o + 1$.

Lemma 5. For any instance of P-BMP® and BMP®, any primal sequence $D'$ corresponds
to at most one good deposition sequence $D$. Furthermore, there exists an algorithm which
runs in time $O(o \cdot n)$ and which either computes this $D$ from $D'$ or correctly outputs that
no such $D$ exists.

Proof. We provide the polynomial time algorithm to compute $D$ from $D'$; uniqueness
follows by the fact that the algorithm is deterministic.

Algorithm ($D'$)

 1  i := 1
 2  Check whether a trivial mask for any character x can be applied.
 3  If not, go to 5.
 4  If yes, apply it, set D := D + x, and go to 2.
 5  Apply the mask for D'[i]. Set D := D + D'[i].
 6  i := i + 1.
 7  If (i ≤ |D'|) then go to 2.
 8  Check whether a trivial mask for any character x can be applied.
 9  If not, go to 11.
10  If yes, apply it, set D := D + x, and go to 8.
11  If there remains a nonempty probe s, then reject.
12  Output D.


The algorithm runs in time $O(|D'| \cdot (c + |S| \cdot \max_{s \in S} |s|)) = O(o \cdot n)$. Correctness follows
from the definition of primal sequences. □
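The procedure from the proof of Lemma 5 can be sketched as follows. The sketch rests on assumptions that are not spelled out in this section: a mask for a character is taken to unmask exactly those cells whose probe needs that character next, and a trivial mask for $x$ is taken to be applicable exactly when every probe is unfinished and needs $x$ next. The function names are hypothetical, and the placement is ignored because only the synthesis state of each probe matters for this step.

```python
def trivial_char(probes, pos):
    """Return x if every probe needs x next (assumed meaning of 'trivial mask'), else None."""
    nxt = {p[k] if k < len(p) else None for p, k in zip(probes, pos)}
    if len(nxt) == 1:
        (x,) = nxt
        return x  # None if all probes are already finished
    return None

def reconstruct(primal, probes):
    """Compute the deposition sequence D for a primal sequence D', or None to reject."""
    pos = [0] * len(probes)  # number of characters already synthesized per probe
    out = []

    def apply_trivial_masks():
        # Steps 2-4 and 8-10: apply trivial masks for as long as one is applicable.
        while (x := trivial_char(probes, pos)) is not None:
            for j in range(len(pos)):
                pos[j] += 1
            out.append(x)

    for ch in primal:
        apply_trivial_masks()
        # Step 5: apply the mask for D'[i]; probes that need ch next are extended.
        for j, (p, k) in enumerate(zip(probes, pos)):
            if k < len(p) and p[k] == ch:
                pos[j] += 1
        out.append(ch)
    apply_trivial_masks()
    # Step 11: reject if some probe is not completely synthesized.
    if any(k < len(p) for p, k in zip(probes, pos)):
        return None
    return "".join(out)

probes = ["AC", "AC", "AT", "AC"]
print(reconstruct("CT", probes), reconstruct("C", probes))
```

Since the loop is deterministic, a given primal sequence leads to at most one output, which mirrors the uniqueness argument in the proof.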

Theorem 4. P-BMP®$_{c,o}$ is fixed-parameter tractable, and there exists an algorithm for
P-BMP®$_{c,o}$ which runs in time O(oc^o · (n + o^)).

Proof. This algorithm builds upon Observation 4. We can branch on all primal sequences.
For each candidate sequence $D'$ we check whether the primal sequence corresponds to a
deposition sequence $D$ via Lemma 5. For each such $D$, we compute and store $\mathrm{BL}(\varphi, D)$.
Finally, a solution with a minimum $\mathrm{BL}(\varphi, D)$ is selected. Observe that an applicable
trivial mask can be found in linear time. Along with Observation 5 this yields a total
runtime of O(oc^o · (n + o^)) by Proposition 1 and Lemma 5. □

4.4 Fpt-Algorithm for BMP®$_{c,o}$

For a multiset $S$ and $s \in S$, we denote by $S^{-s}$ the set of sequences in $S$ which are distinct
from $s$. An instance $(S, r, m, o)$ of BMP®$_{c,o}$ is then called $s$-enveloped if $|S^{-s}| \leq o^2$.

Lemma 6. Any instance $(S, r, m, o)$ of BMP®$_{c,o}$ such that $r > o$ and $m > o$ which is not
$s$-enveloped for any $s \in S$ is a no-instance.

Proof. Consider any placement $\varphi$. For $s \in S$, we say that a column (or row) is $s$-uniform
(w.r.t. $\varphi$) if all cells in the column (or row) are only assigned sequences which are not
distinct from $s$. Furthermore, we say that a column (or row) is uniform if all cells in the
column (or row) are not distinct from some sequence in $S$.

Each non-uniform column and each non-uniform row contains at least one tuple of
neighboring distinct sequences, which (regardless of $D$) contributes to an increase of
$\mathrm{BL}(\varphi, D)$ by at least 1. Hence any solution $(\varphi, D)$ of $(S, r, m, o)$ must contain at most
$o$ rows and at most $o$ columns which are not uniform. Furthermore, if there exists an
$s$-uniform column (or row) for some $s \in S$, then all other uniform columns (rows) must
also be $s$-uniform; otherwise $\varphi$ would contain more than $o$ non-uniform rows (columns),
which we have already argued cannot happen.

To complete the proof, consider the possible cells where a sequence which is distinct
from $s$ may appear. Clearly such sequences may only appear in the at most $o$ non-uniform
columns and in the at most $o$ non-uniform rows, and these intersect in at most
$o^2$ cells. □
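Whether an instance is $s$-enveloped for some $s$ can be tested in one pass over the probes. The sketch below assumes the threshold $o^2$ suggested by the counting argument in the proof (at most $o$ non-uniform rows and $o$ non-uniform columns), and it uses the fact that only a most frequent probe needs to be checked, since its complement is smallest. The function name `enveloping_probe` is hypothetical.

```python
from collections import Counter

def enveloping_probe(probes, o):
    """Return a probe s with at most o*o probes distinct from s, or None if none exists."""
    if not probes:
        return None
    s, k = Counter(probes).most_common(1)[0]  # a most frequent probe and its count
    # len(probes) - k is the number of probes distinct from s (with multiplicity).
    return s if len(probes) - k <= o * o else None

probes = ["AC"] * 10 + ["AT", "GG"]
print(enveloping_probe(probes, 2), enveloping_probe(probes, 1))
```

By Lemma 6, a result of `None` on an instance with $r > o$ and $m > o$ means that the instance can be rejected immediately.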

We now consider two specific subcases of the problem before giving the theorem.

Lemma 7. There is an algorithm which solves any instance $(S, r, m, o)$ of BMP®$_{c,o}$ such
that $m > 2o$ and $r > 2o$ in time O(o^ · c^o · (n + o^)).

Proof. By Lemma 6 there is either a sequence $s \in S$ which represents the majority of
sequences in $S$, or $(S, r, m, o)$ is a no-instance; since only at most one quarter of sequences
in $S$ are distinct from $s$, the sequence $s$ is unique and can be computed in time $|S|$.

Next, by Corollary 1 (and the symmetric statement for rows), we can assume without
loss of generality that all $s$-uniform columns and all $s$-uniform rows are placed consecutively
in $\varphi$. Notice that in this case only the first and last $o$ columns and rows can be
non-$s$-uniform. Since any sequence $q$ distinct from $s$ can only be placed in columns and
rows that are not $s$-uniform, the number of possibilities for $\varphi(q)$ is bounded by $4o^2$.

We now summarize the algorithm. First, we find $s$ in time $|S|$. Second, for each of the
at most $o^2$ sequences $q$ distinct from $s$ we branch on the at most $4o^2$ possible values of
$\varphi(q)$, resulting in a placement $\varphi$. Third, for each such choice of $\varphi$ we use the algorithm
for P-BMP®$_{c,o}$ from Theorem 4 to find an optimal deposition sequence $D$ and store the
obtained $\mathrm{BL}(\varphi, D)$. Finally, we choose a tuple $(\varphi, D)$ with a minimum $\mathrm{BL}(\varphi, D)$. The
bound on the running time follows from Theorem 4. □

Lemma 8. There is an algorithm which solves any instance $(S, r, m, o)$ of BMP®$_{c,o}$ such
that $m > 2o$ and $r < 2o$ in time n ■ .

Proof. By Observation 5 we obtain that the number of distinct column placements is
bounded by $o^r \leq o^{2o}$.

Now we reuse the algorithm given in the proof of Theorem 3 with the only difference
that in Step 1 we branch on primal sequences and compute the corresponding (good)
deposition sequence in polynomial time. The number of primal sequences is bounded
by $o \cdot c^o$ (Observation 4), and the time required to compute the corresponding deposition
sequence is bounded by $O(o \cdot n)$ by Lemma 5. For each fixed deposition sequence, the
running time of steps 2-4 of the algorithm in Theorem 3 is bounded by c°°^°\ and hence
the runtime bound of o^° ■ [o ■ n + n ■ ) = n ■ . □

Theorem 5. BMP®$_{c,o}$ is fixed parameter tractable, and there exists an algorithm for
BMP®$_{c,o}$ which runs in time n ■ .

Proof. In case $m > 2o$ and $r > 2o$ we use the algorithm described in the proof of
Lemma 7. In case $m > 2o$ and $r < 2o$ (or, by symmetry, if $m < 2o$ and $r > 2o$) we
use the algorithm described in the proof of Lemma 8. In case $m < 2o$ and $r < 2o$ we
branch over all of the at most $(4o^2)!$ placements $\varphi$, resulting in at most $(4o^2)!$ instances
of P-BMP®$_{c,o}$ which can be solved individually in time O(oc^o · (n + o^)) by Theorem 4. □

5 Conclusion 

In this work we considered the parameterized complexity of BMP® and P-BMP®, two 
fundamental problems related to the optimal design of microarrays, with respect to 
combinations of parameters centered around the number of distinct characters c. We 
presented fpt-algorithms for both BMP® and P-BMP® if the maximum probe length and
the number of rows are viewed as additional parameters ($c$, $\ell$, $r$); and if the border length
is the additional parameter ($c$, $o$). In addition, we showed that P-BMP® parameterized by
$c$ and $\ell$ is in FPT. For $c$, $r$ (and also $c$ alone) we showed paraNP-hardness for both BMP®
and P-BMP®. Hence, under the usual complexity theoretic assumptions, one cannot hope 
to find an fpt-algorithm for these settings. 

On our agenda for future work is to settle the question whether there is an fpt-algorithm
for BMP®, parameterized by $c$, $\ell$. Another direction for future research is to
study further (structural) parameters for these two problems. Furthermore, in our complexity
analysis we plan to consider more sophisticated target functions that take other
criteria in addition to the border length into account.